87 research outputs found
Selective inference after feature selection via multiscale bootstrap
It is common to report the confidence intervals or p-values of selected
features, or predictor variables in regression, but they often involve
selection bias. The selective inference approach removes this bias by
conditioning on the selection event. Most existing studies of selective
inference consider a specific algorithm, such as Lasso, for feature selection,
and thus they have difficulties in handling more complicated algorithms.
Moreover, existing studies often consider unnecessarily restrictive events,
leading to over-conditioning and lower statistical power. Our novel and
widely-applicable resampling method addresses these issues to compute an
approximately unbiased selective p-value for the selected features. We prove
that the p-value computed by our resampling method is more accurate and more
powerful than existing methods, while the computational cost is of the same order
as the classical bootstrap method. Numerical experiments demonstrate that our
algorithm works well even for more complicated feature selection methods such
as non-convex regularization.
Comment: The title has changed (the previous title was "Selective inference
after variable selection via multiscale bootstrap"). 23 pages, 11 figures.
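The abstract above describes resampling at multiple scales to correct selection bias. As a toy illustration of the multiscale idea only, the sketch below resamples a dataset at several bootstrap sample sizes n' = n/σ² and records how often a feature is selected at each scale; the marginal-correlation selection rule, the threshold, and all numbers are assumptions for illustration, not the paper's procedure (which fits a scaling law to such frequencies to obtain the selective p-value).

```python
# Toy sketch of the multiscale bootstrap idea: track the selection
# frequency of one feature across bootstrap scales sigma^2, where the
# bootstrap sample size is n' = n / sigma^2. The selection rule here
# (marginal correlation screening) is a stand-in assumption.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = 0.5 * X[:, 0] + rng.normal(size=n)   # only feature 0 is truly active

def selected(Xb, yb, thresh=0.2):
    """Toy selection rule: keep features with |marginal correlation| > thresh."""
    r = np.array([np.corrcoef(Xb[:, j], yb)[0, 1] for j in range(Xb.shape[1])])
    return np.abs(r) > thresh

scales = [0.5, 1.0, 2.0]   # sigma^2 values; n' = n / sigma^2
B = 200                    # bootstrap replicates per scale
freqs = {}
for s2 in scales:
    m = max(2, int(round(n / s2)))
    hits = 0
    for _ in range(B):
        idx = rng.integers(0, n, size=m)       # resample with size n'
        hits += bool(selected(X[idx], y[idx])[0])  # was feature 0 selected?
    freqs[s2] = hits / B
    print(f"sigma^2={s2}: selection frequency of feature 0 = {freqs[s2]:.2f}")
```

How the selection frequency changes with the scale σ² is exactly the information the multiscale approach exploits.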
Functional Factorial K-means Analysis
A new procedure for simultaneously finding the optimal cluster structure of
multivariate functional objects and finding the subspace to represent the
cluster structure is presented. The method is based on the k-means criterion
for projected functional objects on a subspace in which a cluster structure
exists. An efficient alternating least-squares algorithm is described, and the
proposed method is extended to a regularized method for smoothness of weight
functions. To deal with the negative effect of correlation in the coefficient
matrix of the basis function expansion in the proposed algorithm, a two-step
approach to the proposed method is also described. Analyses of artificial and
real data demonstrate that the proposed method gives correct and interpretable
results compared with existing methods, the functional principal component
k-means (FPCK) method and the tandem clustering approach. It is also shown that
the proposed method can be considered complementary to FPCK.
Comment: 39 pages, 17 figures.
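The alternating least-squares scheme mentioned above can be sketched for the classical (multivariate, non-functional) factorial k-means criterion, which the functional method applies to basis-expansion coefficients. This is a minimal sketch under that assumption, not the paper's algorithm: it alternates between (a) updating the orthonormal loadings from an eigendecomposition and (b) reassigning points to centroids in the projected subspace.

```python
# Minimal sketch of alternating least squares for the factorial k-means
# criterion  min ||X A - U C||_F^2  over orthonormal loadings A (p x q),
# cluster memberships U (n x k) and centroids C (k x q). The functional
# version in the abstract applies this alternation to coefficient matrices
# of a basis-function expansion.
import numpy as np

def factorial_kmeans(X, k, q, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = rng.integers(0, k, size=n)            # random initial partition
    A = np.linalg.qr(rng.normal(size=(p, q)))[0]   # random orthonormal loadings
    for _ in range(n_iter):
        # Step 1: update A. With centroids profiled out, the objective is
        # tr(A' X'(I - P_U) X A); over A'A = I it is minimized by the
        # eigenvectors of X'(I - P_U)X with the smallest eigenvalues.
        U = np.zeros((n, k)); U[np.arange(n), labels] = 1.0
        P = U @ np.linalg.pinv(U.T @ U) @ U.T      # projector onto cluster space
        M = X.T @ (np.eye(n) - P) @ X
        _, V = np.linalg.eigh(M)                   # eigenvalues sorted ascending
        A = V[:, :q]
        # Step 2: update the partition by assigning projected points to the
        # nearest cluster centroid in the subspace (one k-means-style pass).
        Z = X @ A
        centroids = np.vstack([Z[labels == g].mean(axis=0) if np.any(labels == g)
                               else Z[rng.integers(0, n)] for g in range(k)])
        labels = np.argmin(((Z[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    return labels, A

# Illustrative run on two synthetic clusters in 6 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(30, 6)),
               rng.normal(2.0, 0.3, size=(30, 6))])
labels, A = factorial_kmeans(X, k=2, q=2)
```

The regularized and two-step variants described in the abstract would modify Step 1; the alternation structure stays the same.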
Selective Inference for Testing Trees and Edges in Phylogenetics
Selective inference is considered for testing trees and edges in phylogenetic tree selection from molecular sequences. This improves the previously proposed approximately unbiased test by adjusting the selection bias when testing many trees and edges at the same time. The newly proposed selective inference p-value is useful for testing selected edges to claim that they are significantly supported if p > 1−α, whereas the non-selective p-value is still useful for testing candidate trees to claim that they are rejected if p < α. The selective p-value controls the type-I error conditioned on the selection event, whereas the non-selective p-value controls it unconditionally. The selective and non-selective approximately unbiased p-values are computed from two geometric quantities, called signed distance and mean curvature, of the region representing the tree or edge of interest in the space of probability distributions. These two geometric quantities are estimated by fitting a scaling-law model to the non-parametric multiscale bootstrap probabilities. Our general method is applicable to a wider class of problems; phylogenetic tree selection is an example of model selection, and it is analogous to variable selection in multiple regression, where each edge corresponds to a predictor. Our method is illustrated in a previously controversial phylogenetic analysis of human, rabbit and mouse.
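The scaling-law fit mentioned in the abstract can be sketched as follows. A common form of the model in the multiscale bootstrap literature is BP(σ²) ≈ 1 − Φ(v/σ + cσ), so σ·Φ⁻¹(1 − BP(σ²)) is linear in σ² with intercept v (signed distance) and slope c (mean curvature), and the approximately unbiased p-value is 1 − Φ(v − c). Treat these formulas as my paraphrase of that literature, not the paper's exact equations; the selective p-value is a further function of the same (v, c), omitted here.

```python
# Hedged sketch: recover signed distance v and mean curvature c by a
# least-squares fit of sigma * Phi^{-1}(1 - BP(sigma^2)) = v + c * sigma^2,
# then form the approximately unbiased (AU) p-value 1 - Phi(v - c).
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def Phi_inv(p, lo=-10.0, hi=10.0):
    """Bisection inverse of Phi (avoids a scipy dependency)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fit_scaling_law(sigma2, bp):
    """Fit sigma * Phi^{-1}(1 - BP) = v + c * sigma^2 by least squares."""
    sigma2 = np.asarray(sigma2, float)
    sigma = np.sqrt(sigma2)
    psi = sigma * np.array([Phi_inv(1.0 - b) for b in bp])
    design = np.column_stack([np.ones_like(sigma2), sigma2])
    (v, c), *_ = np.linalg.lstsq(design, psi, rcond=None)
    return v, c

# Synthetic check: BP values generated from the model with v=1.5, c=0.3
# should be recovered exactly by the fit.
sigma2 = np.array([0.5, 0.8, 1.0, 1.25, 2.0])
bp = np.array([1.0 - Phi(1.5 / np.sqrt(s) + 0.3 * np.sqrt(s)) for s in sigma2])
v, c = fit_scaling_law(sigma2, bp)
p_au = 1.0 - Phi(v - c)   # approximately unbiased p-value
print(v, c, p_au)
```

In practice the BP(σ²) values come from bootstrap resampling at sample sizes n' = n/σ², so the fitted v and c carry Monte Carlo error that the synthetic check above deliberately excludes.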